0819 Most Common Word
docs
Summary
This section provides an overview of the problem, examples, and the approach used to solve it.
Problem Statement
The problem, "Most Common Word" (LeetCode 0819), requires finding the most frequently occurring word in a paragraph that is not included in a given list of banned words. The solution must handle punctuation, case insensitivity, and edge cases effectively.
Examples
Two examples illustrate the problem:
Input:
- Paragraph: "Bob hit a ball, the hit BALL flew far after it was hit."
- Banned:
["hit"] - Expected Output:
"ball"
Explanation: The word "hit" is banned. After excluding banned words, "ball" appears most frequently.
Input:
- Paragraph: "a."
- Banned: []
- Expected Output: "a"
Explanation: The only word present is "a", and there are no banned words.
Approach
The solution is implemented using Kusto Query Language (KQL). The steps are as follows:
- Tokenization:
- Split the paragraph into words using the
split()function. - Use
tolower()to convert words to lowercase. - Strip punctuation using
trim(@"\W", word).
- Split the paragraph into words using the
- Filtering:
- Exclude empty words.
- Use the
hasoperator to filter out words from the banned list.
- Counting:
- Group the words by
TestCaseIDand count occurrences usingsummarize count().
- Group the words by
- Selecting the most common word:
- Use the
arg_max()function to identify the word with the highest frequency for each test case.
- Use the
- Validation:
- Compare the result with the expected output to determine if the test case passes or fails.
- Output a table with the test case ID, the expected result, and a "Passed" or "Failed" status.
Prerequisites
Originally, I wanted to include the content here, but org-html-publish-to-html just hangs there…
#+include: "./00_include_kusto_prerequisites.org"
Solution
let TestCases = datatable(TestCaseID:int, paragraph:string, banned:dynamic, expected:string)
[
// TestCaseID, Paragraph, Banned Words, Expected Result
1, "Bob hit a ball, the hit BALL flew far after it was hit.", dynamic(["hit"]), "ball",
2, "a.", dynamic([]), "a"
];
//
TestCases
| mv-expand word = split(tostring(paragraph), ' ')
| project TestCaseID, CleanWord = tolower(trim(@"\W", tostring(word)))
| where CleanWord != ""
| join kind=inner TestCases on TestCaseID
| where banned !has CleanWord
| summarize CountByWord = count() by TestCaseID, CleanWord
| summarize arg_max(CountByWord, CleanWord) by TestCaseID
| join kind=leftouter TestCases on TestCaseID
| project
TestCaseID,
Expected = tolower(expected),
IsPassed = iif(CleanWord == tolower(expected), "Passed", "Failed")
| order by TestCaseID asc
Output:
| TestCaseID | Expected | IsPassed |
|---|---|---|
| 1 | ball | Passed |
| 2 | a | Passed |
What I have learned
dynamic works good with has or !has: where banned !has CleanWord
I've used this approach of first, producing the empty line strings in my javascript implementation: 2025-01-11 0819. Most Common Word dmytro.zharii.com but now I am not sure if this is the best way to solve it. It works, it is simple, but maybe there are some alternatives.
| mv-expand word = split(tostring(paragraph), ' ') | project TestCaseID, CleanWord = tolower(trim(@"\W", tostring(word))) | where CleanWord != ""
Maybe I should somehow separate the logical blocks at this query.